Bayesian Based Comment Spam Defending Tool

Authors

  • Dhinaharan Nagamalai
  • Beatrice Cynthia Dhinakaran
  • Jae-Kwang Lee
Abstract

Spam clutters users' inboxes, consumes network resources, and spreads worms and viruses. Spam is the flooding of unsolicited, unwanted email. Spam in blogs is called blog spam or comment spam. It is carried out by posting comments or flooding spam to services such as blogs, forums, news sites, email archives, and guestbooks. Blog spam generally appears on guestbooks or comment pages, where spammers fill a comment box with spam words. In addition to wasting users' time with unwanted comments, spam also consumes a lot of bandwidth. In this paper, we propose a software tool that prevents such blog spam using a technique based on a Bayesian algorithm, which is derived from Bayes' theorem. It outputs the probability that a comment is spam, given that it contains certain words. This value is computed from past entries and the new comment entry, and is then compared with a threshold value to determine whether it exceeds the threshold. Using this concept, we developed a software tool to block comment spam. The experimental results show that the Bayesian-based tool works well. This paper presents the major findings on the blog spam filter and their significance.
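The scoring step described in the abstract can be illustrated with a minimal sketch. It is not the authors' tool: it assumes a standard naive Bayes model with Laplace smoothing, and the class name `CommentSpamFilter`, the tokenizer, the training comments, and the default threshold of 0.9 are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a comment and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class CommentSpamFilter:
    """Illustrative naive Bayes comment filter; not the paper's implementation."""

    def __init__(self, threshold=0.9):
        # The threshold is an assumed value; the abstract does not state one.
        self.threshold = threshold
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, comment, label):
        """Record a past entry labelled 'spam' or 'ham'."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(comment))

    def spam_probability(self, comment):
        """Return P(spam | words in comment) under the naive Bayes assumption."""
        total_docs = sum(self.doc_counts.values())
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        words = tokenize(comment)
        log_scores = {}
        for label in ("spam", "ham"):
            # Smoothed class prior estimated from past entries.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            for word in words:
                # Laplace-smoothed word likelihood; unseen words get a small mass.
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = score
        # Normalise in log space to avoid underflow on long comments.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])

    def is_spam(self, comment):
        """Block the comment when its spam probability exceeds the threshold."""
        return self.spam_probability(comment) > self.threshold


# Hypothetical past entries used only to demonstrate the calls.
spam_filter = CommentSpamFilter()
spam_filter.train("buy cheap pills online now", "spam")
spam_filter.train("cheap casino bonus click here", "spam")
spam_filter.train("great post thanks for sharing", "ham")
spam_filter.train("I enjoyed reading this article", "ham")

print(spam_filter.is_spam("cheap pills online"))
print(spam_filter.is_spam("thanks for sharing"))
```

Working in log space keeps the product of many small word probabilities from underflowing, and the smoothing terms keep a single unseen word from forcing the probability to zero. A real deployment would need a far larger set of past entries and a threshold tuned on held-out comments.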

Similar resources

Machine Learning in the Presence of an Adversary: Attacking and Defending the SpamBayes Spam Filter

A comparative study for content-based dynamic spam classification using four machine learning algorithms

The growth in email users has resulted in a dramatic increase in spam emails during the past few years. In this paper, four machine learning algorithms, which are Naïve Bayesian (NB), neural network (NN), support vector machine (SVM) and relevance vector machine (RVM), are proposed for spam classification. An empirical evaluation for them on the benchmark spam filtering corpora is prese...

NEIGHBORWATCHER: A Content-Agnostic Comment Spam Inference System

Comment spam has become a popular means for spammers to attract direct visits to target websites, or to manipulate search ranks of the target websites. Through posting a small number of spam messages on each victim website (e.g., normal websites such as forums, wikis, guestbooks, and blogs, which we term as spam harbors in this paper) but spamming on a large variety of harbors, spammers can not...

Library blogs and user participation: a survey about comment spam in library blogs

Purpose: The purpose of this research is to identify and describe the impact of comment spam in library blogs. Three research questions guided the study: current level of commenting in library blogs; librarians' perception of comment spam; and techniques used to address the comment spam problem. Design/methodology/approach: A quantitative approach is used to investigate research questions. Inform...

A New Approach to Spam Mail Detection

The ever-increasing menace of spam is bringing down productivity. More than 70% of email messages are spam, and it has become a challenge to separate such messages from the legitimate ones. I have developed a spam identification engine which employs a naive Bayesian classifier to identify spam. A new concept-based mining model that analyzes terms on the sentence, document is introduced. The...

Journal:
  • CoRR

Volume: abs/1011.3279

Publication date: 2010